Stall-Time Fair Memory Access Scheduling (STFM): Enabling Fair and High-Throughput Sharing of Chip Multiprocessor DRAM Systems

Authors

  • Onur Mutlu
  • Thomas Moscibroda
Abstract

In a chip multiprocessor (CMP) system, where multiple on-chip cores share a common memory interface, simultaneous memory requests from different threads can interfere with each other. Unfortunately, conventional memory scheduling techniques only try to optimize for overall data throughput and do not account for this inter-thread interference. Therefore, different threads running concurrently on the same chip can experience extremely different memory system performance: one thread can experience a severe slowdown or starvation while another is unfairly prioritized by the memory scheduler. Our MICRO-40 paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides performance fairness to threads sharing the DRAM memory system. The key idea of the proposed scheduler is to equalize the DRAM-related slowdown experienced by each thread due to interference from other threads, without hurting overall system performance. To do so, STFM estimates at run-time each thread’s slowdown due to sharing the DRAM system and prioritizes memory requests of threads that are slowed down the most. Unlike previous approaches to DRAM scheduling, STFM comprehensively takes into account inherent memory characteristics of each thread and therefore does not unfairly penalize threads that use the DRAM system without interfering with others. We show how STFM can be configured by the system software to control unfairness and to enforce thread priorities. Our results show that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput on a wide variety of workloads and CMP systems. For example, averaged over 32 different workloads running on an 8-core CMP, the ratio between the highest DRAM-related slowdown and the lowest DRAM-related slowdown reduces from 5.26X to 1.4X, while system throughput improves by 7.6%. We qualitatively and quantitatively compare STFM to one new and three previously-proposed memory access scheduling algorithms, including Network Fair Queueing. Our results show that STFM provides the best fairness, system throughput, and scalability.
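
To make the scheduling policy described in the abstract more concrete, the sketch below shows one way the decision rule could look in code. It is a minimal illustration, not the paper's implementation: it assumes a per-thread slowdown estimate S = T_shared / T_alone and a configurable unfairness threshold (called ALPHA here); the names ThreadStats, Request, pickRequest, and ALPHA are illustrative, and the run-time estimation of T_alone, which is the core mechanism of the paper, is not reproduced.

```cpp
// Illustrative sketch of a stall-time-fair scheduling decision.
// All identifiers are hypothetical; only the slowdown ratio and the
// "intervene when unfairness exceeds a threshold" rule follow the abstract.
#include <vector>
#include <algorithm>
#include <cstdint>
#include <cassert>

struct ThreadStats {
    uint64_t stall_shared = 0;  // measured memory stall cycles while sharing DRAM
    uint64_t stall_alone  = 1;  // estimated stall cycles if run alone (kept >= 1)
    double slowdown() const {
        return static_cast<double>(stall_shared) / static_cast<double>(stall_alone);
    }
};

struct Request {
    int thread_id;
    bool row_hit;      // targets the currently open row (row-buffer hit)
    uint64_t arrival;  // arrival time, for oldest-first ordering
};

// Unfairness threshold, assumed configurable by system software.
constexpr double ALPHA = 1.10;

const Request* pickRequest(const std::vector<Request>& queue,
                           const std::vector<ThreadStats>& threads) {
    assert(!queue.empty() && !threads.empty());

    // Unfairness = largest slowdown / smallest slowdown across threads.
    auto [min_it, max_it] = std::minmax_element(
        threads.begin(), threads.end(),
        [](const ThreadStats& a, const ThreadStats& b) {
            return a.slowdown() < b.slowdown();
        });
    double unfairness = max_it->slowdown() / min_it->slowdown();
    int slowest = static_cast<int>(max_it - threads.begin());

    // Baseline FR-FCFS ordering: row hits first, then oldest first.
    // Returns true if b should be preferred over a.
    auto prefers_b = [](const Request& a, const Request& b) {
        if (a.row_hit != b.row_hit) return b.row_hit;
        return a.arrival > b.arrival;
    };

    if (unfairness > ALPHA) {
        // Fairness mode: serve the most-slowed-down thread's requests first,
        // ordering among them with the baseline policy.
        const Request* best = nullptr;
        for (const Request& r : queue) {
            if (r.thread_id != slowest) continue;
            if (!best || prefers_b(*best, r)) best = &r;
        }
        if (best) return best;
    }
    // Throughput mode: plain FR-FCFS over all queued requests.
    const Request* best = &queue.front();
    for (const Request& r : queue) {
        if (prefers_b(*best, r)) best = &r;
    }
    return best;
}
```

When measured unfairness stays below the threshold, the sketch falls back to plain FR-FCFS (row-hit-first, then oldest-first) ordering, matching the abstract's goal of controlling unfairness without sacrificing overall system throughput.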

Related Articles

Enhancing the Performance and Fairness of Shared DRAM Systems with Parallelism-Aware Batch Scheduling

Onur Mutlu and Thomas Moscibroda, Microsoft Research. In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts, but they can also destro...

A Fair Thread-Aware Memory Scheduling Algorithm for Chip Multiprocessor

In chip multiprocessor (CMP) systems, DRAM memory is a critical resource shared among cores. Scheduled by a single memory controller, memory access requests from different cores may interfere with each other. This interference causes extra waiting time for threads and leads to non-negligible overall system performance loss. In conventional thread-unaware memory scheduling patterns, different thre...

Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor Systems

Memory is commonly a shared resource in a modern chip-multiprocessor system. Concurrently running threads have different memory access behaviors and compete for memory resources. A memory scheduling algorithm should be designed to arbitrate memory requests from different threads and provide high system throughput as well as fairness. This work proposes a memory scheduling algorithm, PriorityBase...

Buffer Aggregation: Addressing Queuing Subsystem Bottlenecks at High Speeds

Modern routers and switch fabrics can have hundreds of input and output ports running at up to 10 Gb/s; 40 Gb/s systems are starting to appear. At these rates, the performance of the buffering and queuing subsystem becomes a significant bottleneck. In high performance routers with more than a few queues, packet buffering is typically implemented using DRAM for data storage and a combination of ...

Uncontrolled Interthread Interference in Main Memory Can Destroy Individual Threads' Memory-Level Parallelism, Effectively Serializing the Memory Requests of a Thread Whose Latencies Would Otherwise Have Largely ...

The main memory (dynamic RAM) system is a major limiter of computer system performance. In modern processors, which are overwhelmingly multicore (or multithreaded), the concurrently executing threads share the DRAM system, and different threads running on different cores can delay each other through resource contention. One thread's memory requests can cause DRAM bank conflicts, row-buffe...

Journal:

Volume   Issue

Pages  -

Publication date: 2007